Automatic syllable-pattern induction in statistical Thai text-to-phone transcription
Abstract
This paper proposes a technique for automatic syllable-pattern induction in statistical Thai text-to-phone transcription. A statistical text-to-phone transcription system is generally built by first defining a set of rules that describe syllable patterns, which are then used for syllabification. Given an input text, the syllabification process generates all possible syllable sequences, which are scored by a statistical model so that the best one can be selected. Updating the handcrafted rule set of syllable patterns is time-consuming and requires expert linguists. Instead of this manual process, automatic induction of new syllable patterns from a large raw text is proposed. A process that can work with raw text is particularly needed for Thai, because segmenting Thai text is a very tedious task. Experiments show that inducing syllable patterns from a large raw text improves the proposed Thai text-to-phone transcription system by approximately 2%. A comparison with other Thai text-to-phone transcription models and error analyses are also given in the paper.
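The generate-then-score pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy Latin-letter syllable patterns, the hypothetical counts in `COUNTS`, the add-one-smoothed unigram scorer, and the function names `syllabify_all` and `best_sequence`. The paper's actual rule set, statistical model, and induction procedure are not specified in this abstract.

```python
import math
import re

# Hypothetical syllable patterns over a toy Latin alphabet
# (C = consonant, V = vowel). A real system would use Thai-specific rules.
C = "[b-df-hj-np-tv-z]"
V = "[aeiou]"
PATTERNS = [re.compile(p) for p in (C + V, C + V + C, V)]

# Hypothetical syllable counts standing in for a trained statistical model.
COUNTS = {"ka": 6, "na": 5, "kan": 3, "a": 1}


def is_syllable(chunk):
    """True if the chunk matches at least one syllable pattern."""
    return any(p.fullmatch(chunk) for p in PATTERNS)


def syllabify_all(text, max_len=3):
    """Return every split of text into chunks that all match a pattern."""
    if not text:
        return [[]]
    results = []
    for end in range(1, min(max_len, len(text)) + 1):
        head = text[:end]
        if is_syllable(head):
            for rest in syllabify_all(text[end:], max_len):
                results.append([head] + rest)
    return results


def log_score(sequence):
    """Unigram log-probability with add-one smoothing (an assumed scorer)."""
    total = sum(COUNTS.values())
    vocab = len(COUNTS) + 1  # one extra slot for unseen syllables
    return sum(
        math.log((COUNTS.get(syl, 0) + 1) / (total + vocab)) for syl in sequence
    )


def best_sequence(text):
    """Pick the highest-scoring candidate, or None if nothing matches."""
    return max(syllabify_all(text), key=log_score, default=None)


if __name__ == "__main__":
    for candidate in syllabify_all("kana"):
        print(candidate, round(log_score(candidate), 3))
    print("best:", best_sequence("kana"))
```

In this sketch, adding a new entry to `PATTERNS` enlarges the candidate set without touching the scorer, which is the part of the pipeline that the proposed induction step is meant to automate.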
Related papers
A Unified Model of Thai Romanization and Word Segmentation
Thai romanization is the practice of writing the Thai language in the Roman alphabet. It can be performed on the basis of orthographic form (transliteration), pronunciation (transcription), or both. As a result, many systems of romanization are in use. The Royal Institute has established the standard by proposing the principle of romanization on the basis of transcription. To ensure the standard, a ful...
A learning method for Thai phonetization of English words
This article tackles the problem of transcribing English words using the Thai phonological system. The problem arises in Thai, where modern writing often includes English orthography and transcription using English phonology sounds unnatural. The proposed model is entirely data-driven, starting with automatic grapheme-phoneme alignment, modeling transduction rules and predicting Thai syllabic tones...
Automatic Phonetization-based Statistical Linguistic Study of Standard Arabic
Statistical studies based on automatic phonetic transcription of Standard Arabic texts are rare, and the studies that have been performed cover only one level, phoneme or syllable, so their results cannot be generalized to the language as a whole. In this paper we automatically derive accurate statistical information about phonemes, allophones, syllables, and allosyllable...
Duration prediction using multi-level model for GPR-based speech synthesis
This paper introduces frame-based Gaussian process regression (GPR) into phone/syllable duration modeling for Thai speech synthesis. The GPR model is designed for predicting frame-level acoustic features using corresponding frame information, which includes relative position in each unit of utterance structure and linguistic information such as tone type and part of speech. Although the GPR-base...
Analysis and modeling of syllable duration for Thai speech synthesis
This paper describes the results of an analysis of the control factors of Thai syllable duration, and a statistical control model based on linear regression. The analyses have been carried out at both the syllable level and the phrase level. For syllable-level duration control, the effects of the five Thai tones and of syllable structures are investigated. To analyze syllable structure effects statis...
Publication date: 2006